Towards a Better Detection of Horizontally Transferred Genes by Combining Unusual Properties Effectively
Abstract
BACKGROUND: Horizontal gene transfer (HGT) is one of the major mechanisms contributing to microbial genome diversification. A number of computational methods for finding horizontally transferred genes have been proposed in the past decades; however, none of them has yet provided a reliable detector. Existing parametric approaches either use a single compositional property in the detection process or simply combine the results obtained from each property separately. Because different properties capture different information, a single property cannot sufficiently represent the information encoded by gene sequences. In addition, published methods have not considered the class imbalance problem in the datasets, which also causes large detection errors. Here we developed a classifier system (Hgtident) that uses a support vector machine (SVM) to combine unusual properties effectively for HGT detection.

RESULTS: Our approach, Hgtident, includes the introduction of more representative datasets, optimization of the SVM model, feature selection, handling of the class imbalance problem in the datasets, and extensive performance evaluation via systematic cross-validation. Through feature selection, we found that JS-DN and JS-CB have higher discriminating power for HGT detection, while GC1-GC3 and k-mer (k = 1, 2, …, 7) make the least contribution. Extensive experiments indicated that the new classifier could reduce Mean error dramatically and also improve Recall to some extent. For the testing genomes, compared with the existing popular multiple-threshold approach, on average our Recall was improved by 2.81% and our Mean error was reduced by 26.32%, which means that many genes that would otherwise be false positives were classified correctly.

CONCLUSIONS: Hgtident is an effective approach for better detecting HGT. Combining multiple features of HGT is also essential for detecting a wider range of HGT events.
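The abstract does not define the JS-DN feature. As a rough illustration only, the sketch below assumes it refers to the Jensen-Shannon divergence between the dinucleotide frequencies of a gene and those of its host genome. The function names `dinucleotide_freqs` and `js_divergence` are hypothetical and are not taken from Hgtident.

```python
import math
from collections import Counter


def dinucleotide_freqs(seq):
    """Relative frequencies of overlapping dinucleotides in a DNA sequence."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only dinucleotides made of unambiguous bases.
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    Both arguments map outcomes to probabilities; missing keys count as 0.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m), skipping zero-probability terms.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Assumed usage: score one gene against the whole-genome background.
genome = "ATGCGTACGTTAGC" * 20   # toy background sequence, not real data
gene = "AAAAAAAATTTTTTTT"        # toy compositionally atypical gene
score = js_divergence(dinucleotide_freqs(gene), dinucleotide_freqs(genome))
```

Under this assumption, a larger score marks a gene whose dinucleotide composition departs more strongly from the genome background. Such a score would be one input feature among several for a classifier, not a detector on its own.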
Similar resources
Integration of horizontally transferred genes into regulatory interaction networks takes many million years.
Adaptation of bacteria to new or changing environments is often associated with the uptake of foreign genes through horizontal gene transfer. However, it has remained unclear how (and how fast) new genes are integrated into their host's cellular networks. Combining the regulatory and protein interaction networks of Escherichia coli with comparative genomics tools, we provide the first systemati...
Towards more robust methods of alien gene detection
Because the properties of horizontally-transferred genes will reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear ...
Horizontal Gene Transfer: Effect and Affect on Computational
Through the efforts of the human genome project, horizontally and laterally transferred genes have been implicated as the source of bacterial protein homologs in the human genome. Additionally, similar protein homologs have been identified in archaeal and bacterial species. Controversy continually surrounds the estimations of numbers of horizontally transferred genes, the time and distance ...
Towards a Simpler Photoautotrophic Cell - Conserved and Variable Genes in Synechococcus Elongatus
Simpler biological systems should be easier to understand and engineer. One way to achieve biological simplicity is through genome minimization. Here we have looked for genomic islands in the fresh water cyanobacterium Synechococcus elongatus PCC 7942 that could be used as targets for deletion for genome minimization. By using a combination of methods we have identified 184 genes that have been...
A computational tool for the genomic identification of regions of unusual compositional properties and its utilization in the detection of horizontally transferred sequences.
Similarity Plot (S-plot) is a Windows-based application for large-scale comparisons and 2-dimensional visualization of compositional similarities between genomic sequences. This application combines 2 approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot-plot visual representation. S-plot is effective in identifying highly similar regions betwe...